deep learning compiler
Review for NeurIPS paper: Fast geometric learning with symbolic matrices
Relation to Prior Work: The authors both discuss the implementation differences from, and compare the performance of their library against, strong baselines in many different application areas. Their results are impressive, especially given that some of the baselines are heavily optimized for specific problems. I'm wondering if PyTorch-Geometric's main competitor DGL should be an additional comparison point for the geometric deep learning benchmarks; I think it's often faster in practice, although it may be too specialized for these architectures. I would like to see more discussion of the similarities and differences between your implementation and deep learning compilers like XLA and TVM. For instance, does your package do just-in-time CUDA code generation/compilation or perform operator fusion?
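To make the "operator fusion" question concrete, here is a minimal sketch in plain Python of what a compiler like XLA or TVM does when it fuses elementwise operators: instead of materializing one intermediate buffer per operator, it emits a single pass that applies all operators at once. The operators chosen here (scale, shift, clamp) are illustrative, not taken from the paper under review.

```python
# Unfused pipeline: each stage allocates a full intermediate list,
# the way eager frameworks execute a chain of elementwise ops.
def unfused(xs):
    scaled = [x * 2.0 for x in xs]             # op 1: scale
    shifted = [x + 1.0 for x in scaled]        # op 2: shift
    clamped = [max(x, 0.0) for x in shifted]   # op 3: ReLU-style clamp
    return clamped

# Fused kernel: one traversal, no intermediate buffers -- the form
# a deep learning compiler would generate for the same graph.
def fused(xs):
    return [max(x * 2.0 + 1.0, 0.0) for x in xs]
```

Both functions compute the same result; the fused version saves memory traffic, which is usually the dominant cost for elementwise chains on GPUs.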
New Major Release for Nebullvm Speeds Up AI Inference by 2-30x
Nebuly is very excited to announce the new major release, nebullvm 0.3.0. Nebullvm is an open-source library that generates an optimized version of your deep learning model that runs 2 to 10 times faster in inference without performance loss, by leveraging multiple deep learning compilers (OpenVINO, TensorRT, ONNX Runtime, TVM, etc.). Additional acceleration is achieved by exploiting optimization techniques that slightly modify the model graph to make it lighter, such as quantization, half precision, distillation, and sparsity. Find tutorials and examples on how to use nebullvm, as well as installation instructions, in the main readme of the nebullvm library. It takes a few lines of code to install the library and optimize your models. The library now works on most CPUs and GPUs and will soon support TPUs and other deep learning-specific ASICs.
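As a flavor of one technique mentioned above, here is a self-contained sketch of symmetric int8 quantization in plain Python. This is an illustration of the general idea (map float weights to 8-bit integers via a scale factor, trading a bounded amount of precision for smaller, faster models), not nebullvm's actual API or implementation.

```python
# Symmetric affine quantization: w_q = round(w / scale), w ~= w_q * scale.
# The scale is chosen so the largest-magnitude weight maps to 127,
# keeping every quantized value inside the signed int8 range.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.1, -0.5, 0.25]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Per-weight reconstruction error is bounded by scale / 2.
```

The same idea underlies half precision: both shrink the numeric representation of the graph's tensors, which is why such techniques can speed up inference with little or no accuracy loss.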
An End-to-End HW/SW Co-Design Methodology to Design Efficient Deep Neural Network Systems using Virtual Models
Klaiber, Michael J., Vogel, Sebastian, Acosta, Axel, Korn, Robert, Ecco, Leonardo, Back, Kristine, Guntoro, Andre, Feldner, Ingo
End-to-end performance estimation and measurement of deep neural network (DNN) systems become more important with the increasing complexity of DNN systems consisting of hardware and software components. The methodology proposed in this paper aims at a reduced turn-around time for evaluating different design choices for the hardware and software components of DNN systems. This reduction is achieved by moving performance estimation from the implementation phase to the concept phase, by employing virtual hardware models instead of gathering measurement results from physical prototypes. Deep learning compilers introduce hardware-specific transformations and are therefore considered part of the design flow of virtual system models for extracting end-to-end performance estimations. To validate the run-time accuracy of the proposed methodology, a system processing the DilatedVGG DNN is realized both as a virtual system model and as a hardware implementation. The results show that up to 92% accuracy can be reached in predicting the processing time of the DNN inference.
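The abstract does not spell out how prediction accuracy is scored; a common convention is one minus the relative error of the estimate against the measured value, sketched below. The metric definition and the numbers are assumptions for illustration only, not taken from the paper.

```python
# Assumed metric: 1 - relative error of the virtual-model estimate
# against the measured hardware run time; 1.0 means a perfect estimate.
def prediction_accuracy(estimated_ms, measured_ms):
    return 1.0 - abs(estimated_ms - measured_ms) / measured_ms

# Hypothetical numbers: an estimate of 9.2 ms against a measured
# 10.0 ms would score 92% under this definition.
acc = prediction_accuracy(9.2, 10.0)
```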